Bayesian neural network in memory

ABSTRACT

Apparatuses and methods can be related to implementing a Bayesian neural network in a memory. A Bayesian neural network can be implemented utilizing a resistive memory array. The memory array can comprise programmable memory cells that can be programmed and used to store weights of the Bayesian neural network and perform operations consistent with the Bayesian neural network.

TECHNICAL FIELD

The present disclosure relates generally to memory, and more particularly to apparatuses and methods associated with implementing a Bayesian neural network in memory.

BACKGROUND

Memory devices are typically provided as internal, semiconductor, integrated circuits in computers or other electronic devices. There are many different types of memory including volatile and non-volatile memory. Volatile memory can require power to maintain its data and includes random-access memory (RAM), dynamic random access memory (DRAM), and synchronous dynamic random access memory (SDRAM), among others. Non-volatile memory can provide persistent data by retaining stored data when not powered and can include NAND flash memory, NOR flash memory, read only memory (ROM), Electrically Erasable Programmable ROM (EEPROM), Erasable Programmable ROM (EPROM), and resistance variable memory such as phase change random access memory (PCRAM), resistive random access memory (RRAM), and magnetoresistive random access memory (MRAM), among others.

Memory is also utilized as volatile and non-volatile data storage for a wide range of electronic applications including, but not limited to, personal computers, portable memory sticks, digital cameras, cellular telephones, portable music players such as MP3 players, movie players, and other electronic devices. Memory cells can be arranged into arrays, with the arrays being used in memory devices.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an apparatus in the form of a computing system including a memory device in accordance with a number of embodiments of the present disclosure.

FIG. 2 illustrates an example Bayesian neural network in accordance with a number of embodiments of the present disclosure.

FIG. 3 illustrates an example memory array in accordance with a number of embodiments of the present disclosure.

FIG. 4A illustrates a first portion of an example flow for performing forward propagation in accordance with a number of embodiments of the present disclosure.

FIG. 4B illustrates a second portion of an example flow for performing forward propagation in accordance with a number of embodiments of the present disclosure.

FIG. 5 illustrates an example flow for performing backward propagation in accordance with a number of embodiments of the present disclosure.

FIG. 6 illustrates an example flow for updating weights in accordance with a number of embodiments of the present disclosure.

FIG. 7 illustrates an example flow diagram of a method for implementing a Bayesian neural network in memory in accordance with a number of embodiments of the present disclosure.

FIG. 8 illustrates an example machine of a computer system within which a set of instructions, for causing the machine to perform various methodologies discussed herein, can be executed.

DETAILED DESCRIPTION

The present disclosure includes apparatuses and methods related to implementing a Bayesian neural network in memory. A Bayesian neural network can be implemented (e.g., trained and used in inference) in memory utilizing a memory array.

As used herein, a probabilistic neural network is a feed forward neural network architecture that estimates uncertainty. A Bayesian neural network is an example of a probabilistic neural network. A Bayesian neural network is a neural network architecture with posterior inference. A Bayesian neural network can also be a stochastic neural network. As used herein, a stochastic neural network is a neural network architecture utilizing random variations such as stochastic transfer functions, stochastic weights, and/or stochastic biases.

The examples described herein describe a neural network that implements stochastic weights and biases and which may provide posterior inference. The examples described herein can be implemented using a Bayesian neural network.

Mission critical systems, such as medical systems or automotive systems, utilize estimation of uncertainty in neural network models. The prediction of uncertainty can be used to assess how much to trust a forecast produced by a neural network model. For example, in healthcare, reliable uncertainty estimates can prevent overconfident decisions for rare or novel patient conditions. In autonomous agents that actively explore their environment, uncertainty estimates can be used to identify which data points are most informative. Bayesian neural networks can be, for example, used to identify an action to take with regard to steering a vehicle. For instance, a Bayesian neural network can receive an image of an intersection. The image can be provided as a vector of inputs to the Bayesian neural network. The Bayesian neural network can generate a position of a steering wheel and a certainty associated with the position of the steering wheel.

However, traditional implementations of Bayesian neural networks may be ineffective at providing true Gaussian random variables. Traditional implementations of Bayesian neural networks may also perform computations of high-dimensional integrals inefficiently.

Aspects of the present disclosure address the above and other deficiencies. For instance, a number of embodiments employ hardware of a memory device to create Gaussian random variables in an efficient manner as compared to traditional implementations of Bayesian neural networks. A number of embodiments implement Bayesian neural networks in a memory device to utilize the hardware of the memory device to compute high-dimensional integrals efficiently.

A Bayesian neural network can be implemented in a memory device utilizing a number of memory cells and one or more stochastic pulse generators. The concurrent use of memory cells to implement a neural network can increase parallelism.

The figures herein follow a numbering convention in which the first digit or digits correspond to the drawing figure number and the remaining digits identify an element or component in the drawing. Similar elements or components between different figures may be identified by the use of similar digits. For example, 110 may reference element “10” in FIG. 1, and a similar element may be referenced as 310 in FIG. 3. Analogous elements within a Figure may be referenced with a hyphen and extra numeral or letter. See, for example, elements 442-1, 442-2, 442-3, 442-4 in FIGS. 4A and 4B. As will be appreciated, elements shown in the various embodiments herein can be added, exchanged, and/or eliminated so as to provide a number of additional embodiments of the present disclosure. In addition, as will be appreciated, the proportion and the relative scale of the elements provided in the figures are intended to illustrate certain embodiments of the present invention and should not be taken in a limiting sense.

FIG. 1 is a block diagram of an apparatus in the form of a computing system 100 including a memory device 103 in accordance with a number of embodiments of the present disclosure. As used herein, a memory device 103, memory array 110, and/or a host 102, for example, might also be separately considered an “apparatus”.

In this example, the computing system 100 includes a host 102 coupled to memory device 103 via an interface 104. The computing system 100 can be a personal laptop computer, a desktop computer, a digital camera, a mobile telephone, a memory card reader, or an Internet-of-Things (IoT) enabled device, among various other types of systems. Host 102 can include a number of processing resources (e.g., one or more processors, microprocessors, or some other type of controlling circuitry) capable of accessing the memory device 103. The computing system 100 can include separate integrated circuits, or both the host 102 and the memory device 103 can be on the same integrated circuit. For example, the host 102 may be a system controller of a memory system comprising multiple memory devices 103, with the system controller providing access to the respective memory devices 103 by another processing resource such as a central processing unit (CPU).

For clarity, the computing system 100 has been simplified to focus on features with particular relevance to the present disclosure. The memory array 110 can be a DRAM array, SRAM array, STT RAM array, PCRAM array, TRAM array, RRAM array, NAND flash array, NOR flash array, and/or 3D Cross-point array, for instance. The array 110 can comprise memory cells arranged in rows coupled by access lines (which may be referred to herein as word lines or select lines) and columns coupled by sense lines (which may be referred to herein as digit lines or data lines). Although the memory array 110 is shown as a single memory array, the memory array 110 can represent a plurality of memory arrays arranged in banks of the memory device 103.

The memory device 103 includes address circuitry 106 to latch address signals provided over the interface 104. The interface can include, for example, a physical interface employing a suitable protocol (e.g., a data bus, an address bus, and a command bus, or a combined data/address/command bus). Such a protocol may be custom or proprietary, or the interface 104 may employ a standardized protocol, such as Peripheral Component Interconnect Express (PCIe), Gen-Z interconnect, cache coherent interconnect for accelerators (CCIX), or the like. Address signals are received and decoded by a row decoder 108 and a column decoder 112 to access the memory arrays 110. Data can be read from the memory arrays 110 by sensing voltage and/or current changes on the sense lines using sensing circuitry 111. The sensing circuitry 111 can be coupled to the memory arrays 110. Each memory array and corresponding sensing circuitry can constitute a bank of the memory device 103. The sensing circuitry 111 can comprise, for example, sense amplifiers that can read and latch a page (e.g., row) of data from the memory array 110. The I/O circuitry 107 can be used for bi-directional data communication with the host 102 over the interface 104. The read/write circuitry 113 is used to write data to the memory arrays 110 or read data from the memory arrays 110. As an example, the circuitry 113 can comprise various drivers, latch circuitry, etc.

Control circuitry 105 decodes signals provided by the host 102. The signals can be commands provided by the host 102. These signals can include chip enable signals, write enable signals, and address latch signals that are used to control operations performed on the memory array 110, including data read operations, data write operations, and data erase operations. In various embodiments, the control circuitry 105 is responsible for executing instructions from the host 102. The control circuitry 105 can comprise a state machine, a sequencer, and/or some other type of control circuitry, which may be implemented in the form of hardware, firmware, or software, or any combination of the three. In some examples, the host 102 can be a controller external to the memory device 103. For example, the host 102 can be a memory controller which is coupled to a processing resource of a computing device. Data can be provided to the memory array 110 and/or from the memory array via the data lines coupling the memory array 110 to the I/O circuitry 107.

In various instances, the memory array 110 can be a resistive memory array. The resistive memory array can be a resistive programmable device. That is, the memory array 110 can be programmed by modifying the resistance of the memory cells included in the memory array 110. The memory cells can be programmed to a specific resistance (or conductance). With respect to programming the memory cells and/or representing values with the memory cells, the terms resistance and conductance are used interchangeably herein since any change in resistance is accompanied by a corresponding change in conductance. The resistance of the memory cells can represent values that can be used in the performance of operations. For instance, the resistance of the memory cells can be used to perform a multiplication operation, among other types of operations.
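For illustration only, the multiplying role of a programmed conductance can be modeled digitally: by Ohm's law, a cell programmed to conductance g and driven with voltage v contributes a current g·v, and a shared line sums the currents of its cells. The following Python sketch is a hypothetical model of one such column, not the disclosed circuitry.

```python
def column_current(conductances, voltages):
    """Model of one shared line: I = sum_i g_i * v_i (Ohm's law per cell,
    Kirchhoff's current law at the shared line)."""
    return sum(g * v for g, v in zip(conductances, voltages))

# Example: cells programmed to 0.2 S and 0.5 S, driven with 1.0 V and 0.4 V,
# produce a summed current of 0.2*1.0 + 0.5*0.4 = 0.4 A.
print(column_current([0.2, 0.5], [1.0, 0.4]))
```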

In various examples, the resistance of the memory cells can be programmed to represent weight values and bias values of a neural network. The ability to program the resistance of the memory cells can contribute to the ability to perform forward updates, backward updates, and weight updates utilizing a limited number of banks of the memory array 110. The weights and the biases can be selected at random utilizing the pulse generators 114 further described in FIG. 3. The results of the operations performed at the layers of the Bayesian neural network can be converted to voltage signals utilizing the analog-to-digital converters (ADCs) 115 further described in FIG. 3. Although the pulse generators 114 and ADCs 115 are illustrated as being coupled directly to the memory array 110, in some embodiments the pulse generators 114 and/or ADCs 115 can be coupled to the memory array 110 via the sensing circuitry 111, the row decoder 108, or the column decoder 112.

FIG. 2 illustrates an example Bayesian neural network 220 in accordance with a number of embodiments of the present disclosure. The Bayesian neural network 220 can comprise an input layer 221, a hidden layer 222-1, a hidden layer 222-2, and an output layer 223. The hidden layers 222-1, 222-2 are referred to as hidden layers 222.

The input layer 221 can receive a plurality of input values and can provide the input values to the hidden layer 222-1. The hidden layers 222 can perform a plurality of computations on the inputs and intermediary values using weights 224 (e.g., W₁, W₂) and biases 225 (e.g., b₁, b₂).

Each of the hidden layers 222 can comprise a different set of weights that can be used to perform computations of the Bayesian neural network 220. For instance, the weights W₁ represent a plurality of weight values. The weight values corresponding to the weights W₁ can include, for example, the weight values W₁₁, W₁₂, . . . , W_(1N). The biases b₁ represent a plurality of bias values. The bias values corresponding to the biases b₁ can include, for example, the bias values b₁₁, b₁₂, . . . , b_(1N). FIG. 2 shows the Bayesian neural network 220 as comprising two hidden layers 222. However, Bayesian neural networks can comprise more than two hidden layers 222. The examples described herein are illustrative and can be expanded to cover different implementations of Bayesian neural networks.

The Bayesian neural network 220 shown in FIG. 2 is referred to as a fully connected Bayesian neural network 220 given that each of the nodes of each layer is connected to each of the nodes of an adjacent layer. For instance, the input layer 221 can be described as comprising a plurality of input nodes and the hidden layer 222-1 is shown to comprise a plurality of hidden nodes. Each of the input nodes can be connected to each of the hidden nodes of the hidden layer 222-1. Likewise, each of the nodes of the hidden layer 222-1 is connected to each of the nodes of the hidden layer 222-2.

Each connection between nodes can be assigned a corresponding weight value and a bias value. The output of each of the nodes in the hidden layer 222-1 can be provided to each of the nodes in the hidden layer 222-2. A number of operations can be performed utilizing the output of the nodes in the hidden layer 222-1, corresponding weight values from the weights 224, and corresponding bias values from the biases 225. The nodes from the hidden layer 222-2 can generate an output which can be provided as an input to the output layer 223. An output of the output layer 223 can be an output of the Bayesian neural network 220.

In various examples, the weights 224 and the biases 225 are random variables. Each of the weights 224 and each of the biases 225 is assigned a Gaussian distribution having a mean and a standard deviation. For example, the weights 224 have a Gaussian distribution having a mean μ_(i) ^(W) and a standard deviation ρ_(i) ^(W). The biases 225 have a Gaussian distribution having a mean μ_(i) ^(b) and a standard deviation ρ_(i) ^(b), where i represents a layer from the layers of the Bayesian neural network, such as the hidden layer 222-1. Although the weights 224 are described as having a Gaussian distribution having a mean μ_(i) ^(W) and a standard deviation ρ_(i) ^(W), each of the weights 224 from the weights W₁₁, W₁₂, . . . , W_(1N) and the weights W₂₁, W₂₂, . . . , W_(2N) can have a separate Gaussian distribution having a mean and a standard deviation. Also, each of the biases 225 from the biases b₁₁, b₁₂, . . . , b_(1N) and the biases b₂₁, b₂₂, . . . , b_(2N) can have a separate Gaussian distribution having a mean and a standard deviation.
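One common way to realize such Gaussian weights, sketched below in Python, is the reparameterization w = μ + σ·ε with σ = log(1 + exp(ρ)) and ε drawn from a unit Gaussian (the role a stochastic pulse can play). The softplus mapping from ρ to σ is an assumption here, though it is consistent with the ε/(1 + exp(−ρ)) term that appears in the weight-update flow of FIG. 6.

```python
import math
import random

def sample_weight(mu, rho):
    """Draw one stochastic weight: w = mu + softplus(rho) * eps."""
    sigma = math.log1p(math.exp(rho))  # softplus keeps the standard deviation positive
    eps = random.gauss(0.0, 1.0)       # unit Gaussian noise (the stochastic pulse)
    return mu + sigma * eps
```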

Each of the input layer 221, the hidden layers 222, and the output layer 223 can be implemented in one or more banks of the memory array. Furthermore, a plurality of operations performed in the layers 221, 222, and 223 can be implemented using one or more banks of the memory array.

An output of the Bayesian neural network 220 can be a conditional distribution which is expressed as P(Ŷ|X^(new), w, D). In the conditional distribution P(Ŷ|X^(new), w, D), w is a plurality of weights, D is training data, X^(new) is an input to the Bayesian neural network 220, and Ŷ is a result produced by the Bayesian neural network 220. P(Ŷ|X^(new), w, D) can be described as an estimation of uncertainty of a result Ŷ given the input X^(new), the weights w, and the training data D.

P(Ŷ|X^(new), w, D) can be equal to Σ_(w) P(Ŷ|X^(new), w)P(w|D). However, calculating P(w|D) may be infeasible due to computational constraints. P(w|D) can be approximated with q(w|θ), where θ represents the parameters {μ,ρ} such that θ={μ,ρ}.
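In practice, the sum over w is typically approximated by Monte Carlo: weights are sampled from the (approximate) posterior and the network outputs are averaged. A hypothetical Python sketch, where `forward(x, w)` stands in for one pass through the network and is not defined by this disclosure:

```python
def predictive_distribution(x_new, weight_samples, forward):
    """Estimate P(Y | x_new, D) by averaging the network's output
    distribution over weight samples drawn from the approximate posterior."""
    outputs = [forward(x_new, w) for w in weight_samples]
    n = len(outputs)
    return [sum(col) / n for col in zip(*outputs)]
```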

The parameters θ can be selected to minimize F(D,θ), which is referred to as a cost function. The cost function F(D,θ) can also be referred to as a KL-divergence. A KL-divergence is a measure of how one probability distribution is different from a second probability distribution. The cost function F(D,θ) can provide a measure of how the distribution P(w|D) differs from the distribution q(w|θ) such that F(D,θ) provides a distance between P(w|D) and q(w|θ).

The cost function F(D,θ) can be defined as F(D,θ)≈Σ_(i∈layers)[log q(w^(i)|θ)−log P(w^(i))−log P(D|w^(i))], where q(w^(i)|θ) is referred to as a variational posterior and P(w^(i)) is referred to as a prior, and where w^(i) denotes the ith Monte Carlo sample drawn from the variational posterior q(w^(i)|θ). w^(i) can be selected based on a random number. P(D|w^(i)) is a likelihood that measures how well a model of the Bayesian neural network 220 having the weights w fits the training data D. The variational posterior is defined as q(w^(i)|θ)=N(θ)=N({μ^(w) ^(i), ρ^(w) ^(i), μ^(b) ^(i), ρ^(b) ^(i)}_(i)). The variational posterior is a Gaussian distribution. The prior is defined as P(w)=Π_(j)[πN(w_(j)|0, σ₁²)+(1−π)N(w_(j)|0, σ₂²)]. A Gaussian mixture model is used to model the prior dependent on the weights w, where σ₁ and σ₂ are constants representing standard deviations.
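As a purely illustrative reading of these definitions, the Python sketch below evaluates one Monte Carlo term of F(D,θ) for a single weight; the log-likelihood log P(D|w^(i)) is assumed to be computed elsewhere.

```python
import math

def gaussian_logpdf(w, mu, sigma):
    """log N(w | mu, sigma^2)."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (w - mu) ** 2 / (2.0 * sigma ** 2))

def cost_term(w, mu, sigma, pi, sigma1, sigma2, log_likelihood):
    """One Monte Carlo term: log q(w|theta) - log P(w) - log P(D|w)."""
    log_q = gaussian_logpdf(w, mu, sigma)
    # Two-component Gaussian scale-mixture prior from the text.
    prior = (pi * math.exp(gaussian_logpdf(w, 0.0, sigma1))
             + (1.0 - pi) * math.exp(gaussian_logpdf(w, 0.0, sigma2)))
    return log_q - math.log(prior) - log_likelihood
```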

FIG. 3 illustrates an example memory array 310 in accordance with a number of embodiments of the present disclosure. The memory array 310 comprises a plurality of memory cells 333. The memory cells 333 are coupled to sense lines 335 and access lines 336.

The memory cells 333 can be resistive memory cells. The resistive memory cells 333 can comprise terminals that couple the memory cells 333 to the sense lines 335 and the access lines 336. The terminals of the memory cells 333 can be coupled to each other via a resistive element 334. The resistive element 334 can be a resistance variable material (e.g., a material programmable to multiple different resistance states, which can represent multiple different data states) such as, for example, a transition metal oxide material, or a perovskite including two or more metals (e.g., transition metals, alkaline earth metals, and/or rare earth metals). Other examples of resistance variable materials that can be included in the storage element of the resistive memory cells 333 can include various materials employing trapped charges to modify or alter conductivity, chalcogenides formed of various doped or undoped materials, binary metal oxide materials, colossal magnetoresistive materials, and/or various polymer based resistive variable materials, among others. Embodiments are not limited to a particular resistance variable material or materials. In various instances, the conductance of the memory cells 333 can be programmed by programming the resistive element 334. For instance, control circuitry of a memory device can program the resistive element 334. Actions performed by a memory device, the memory array 310, the memory cells 333, a pulse generator (e.g., deterministic pulse generators 331-1, 331-2, and stochastic pulse generators 332), and/or analog-to-digital converters 315-1, 315-2 can be said to be performed by or caused by control circuitry of the memory device.

The conductance can represent a weight value of a Bayesian neural network. For example, the conductance of the memory cells 333 can represent weight values of a layer of the Bayesian neural network. As used herein, the terms weights and weight values are used interchangeably.

The memory cells 333 can be used in the performance of operations. The memory cells 333 can be controlled to perform matrix multiplication in parallel and locally to the memory device hosting the memory array 310. Matrix multiplication can be performed utilizing inputs and a plurality of weight values. The inputs can be provided as an input vector. The plurality of weight values, which are represented by the conductance of the memory cells 333, can be provided as a weight matrix. The inputs are denoted in FIG. 3 as V_(in) and can comprise the vector

$\begin{bmatrix}x_{0} \\ \vdots \\ x_{n}\end{bmatrix}.$

Each of the inputs (e.g., x₀ . . . x_(n)) can be provided to the memory array 310 via signal lines such as the sense lines 335 or the access lines 336. FIG. 3 shows that the inputs can be provided via the access lines 336 and/or the sense lines 335. Each of the access lines 336 can provide a portion of the input (one of the input values). For example, a first access line can provide the input value x₀, . . . , and a last access line can provide the input value x_(n), where n is equal to a quantity of access lines 336 or is less than the quantity of access lines 336. The inputs are provided by the pulse generators 331 and the pulse generator 332.

The inputs can be multiplied with a weight matrix comprising the weight values stored by the memory array 310 and which are represented by the conductance of the memory cells 333. The weight matrix is denoted as

$\begin{bmatrix}w_{00} & \ldots & w_{0n} \\ \vdots & \ddots & \vdots \\ w_{n0} & \ldots & w_{nn}\end{bmatrix}.$

Each of the memory cells 333 can store a different weight value represented by its conductance.

The outputs of the matrix multiplication can be provided as an output vector

$\begin{bmatrix}h_{0} \\ \vdots \\ h_{n}\end{bmatrix}.$

Each of the outputs (e.g., h₀ . . . h_(n)) can be provided via a different one of the signal lines such as the sense lines 335 or the access lines 336. Matrix multiplication is denoted as

$\begin{bmatrix}h_{0} \\ \vdots \\ h_{n}\end{bmatrix} = \begin{bmatrix}w_{00} & \ldots & w_{0n} \\ \vdots & \ddots & \vdots \\ w_{n0} & \ldots & w_{nn}\end{bmatrix}^{T}\begin{bmatrix}x_{0} \\ \vdots \\ x_{n}\end{bmatrix},$

or h=Wx. In various examples, multiple instances of matrix multiplication can be performed in the memory array 310. A single instance of matrix multiplication can also be performed in the memory array 310.
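A digital model of this in-memory product, offered for illustration only (the array computes it in analog, in one step):

```python
def in_memory_matmul(w, x):
    """h = W^T x, i.e., h_j = sum_i w[i][j] * x[i]. Each input x_i models a
    voltage applied on one line; each output h_j models the summed current
    on the line crossing it."""
    return [sum(row[j] * xi for row, xi in zip(w, x))
            for j in range(len(w[0]))]

# Example: in_memory_matmul([[1, 2], [3, 4]], [1, 1]) returns [4, 6].
```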

As such, the memory array 310 and/or banks of the memory array 310 can be described as processing data. As used herein, processing includes utilizing memory (e.g., a memory array and/or a bank of memory) to generate an output responsive to receipt of an input. The output can be generated using the resistance of the memory cells of the memory and the input to the memory.

The inputs can be provided by pulse generators 331-1, 331-2, 332 (e.g., voltage pulse generators). The pulse generators 331-1, 331-2, 332 can comprise hardware to generate voltage pulses. In various examples, the pulse generators 331-1, 331-2, 332 can receive a voltage input or a plurality of voltage inputs and can generate a plurality of voltage pulses. In some examples, the pulse generators 332 can be stochastic random number generators. The pulse generators 332 can implement a dropout scheme which can be used along with the generation of random numbers for sampling. The pulse generators 331-1, 331-2 can be deterministic pulse generators.

The outputs can be provided via the sense lines 335 or the access lines 336. The outputs can be interpreted as current signals. The outputs can be provided to analog-to-digital converters (ADCs) 315-1, 315-2. The ADCs 315-1, 315-2 can receive a current and can output a voltage. The ADC 315-1 can measure the current provided by the access lines 336. The ADC 315-2 can measure the current provided by the sense lines 335. The output of the ADCs 315-1, 315-2 can be a voltage signal that can be stored in registers of the memory device or which can be provided directly to a voltage pulse generator coupled to a different memory array or a same memory array pending reprogramming of the memory array 310.

For example, the memory array 310 can be used to generate an output which can be converted to a voltage signal by the ADCs 315-1, 315-2. The voltage signal can be stored in registers of the memory device. The memory array 310 can then be reprogrammed by resetting the conductance of the memory cells 333. Resetting the conductance of the memory cells 333 can reprogram the memory array 310 to function as a different layer of the Bayesian neural network. The output stored in the registers can be provided as an input to the pulse generators 331-1, 331-2, 332 which can provide an input to the memory array 310.

FIG. 3 shows the operation of the memory array 310 to implement a layer of a Bayesian neural network. Multiple layers of the Bayesian neural network can be implemented to forward propagate the Bayesian neural network, which can result in the generation of an inference. The Bayesian neural network can also be forward propagated to prepare for backward propagation as described below.

The resistive components 334 can be programmed by providing inputs via the sense lines 335 and the access lines 336. Operations can be performed by providing inputs through one of the sense lines 335 or the access lines 336. Forward propagation can be performed by providing inputs through one of the sense lines 335 or the access lines 336. Backward propagation can be performed by providing inputs through the other of the sense lines 335 or the access lines 336. For example, with respect to FIG. 3, inputs (the x vector) can be provided via the access lines 336 to the memory cells 333, which store the weight matrix w. The x vector can be the output from the deterministic pulse generators 331-1. The deterministic pulse generators 331-1 can be operated to generate a known, defined output. Providing the inputs to the memory cells 333 effectively multiplies the x vector by the weight matrix w and results in the generation of an output (the h vector). In various examples, the x vector can be the output from the stochastic pulse generators 332. The x vector can comprise random variables generated by the stochastic pulse generators 332. The weight matrix w can be stored in the memory cells 333 using the deterministic pulse generators 331-1 and the deterministic pulse generators 331-2. The deterministic pulse generators 331-1, 331-2 can provide inputs concurrently to store the weight matrix w in the memory cells 333. The deterministic pulse generators 331-1 can provide inputs via the access lines 336 while the deterministic pulse generators 331-2 provide inputs via the sense lines 335.
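The practical consequence is that the same stored conductances serve both directions: driving the access lines yields the forward product h = Wᵀx, while driving the sense lines yields the transposed product Wδ for backward propagation, with no rearrangement of stored data. A hypothetical digital model of the backward direction (the forward direction is sketched above):

```python
def backward_multiply(w, delta):
    """d = W * delta, i.e., d_i = sum_j w[i][j] * delta[j]. Driving the
    opposite set of lines yields the transposed product relative to the
    forward pass, using the same stored weights."""
    return [sum(wij * dj for wij, dj in zip(row, delta)) for row in w]
```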

FIGS. 4A and 4B illustrate a first portion and a second portion of an example flow for performing forward propagation in accordance with a number of embodiments of the present disclosure. Due to its size, FIG. 4 is split across two sheets as FIG. 4A and FIG. 4B. Forward propagation can be performed utilizing a plurality of banks of a memory array. For instance, forward propagation is shown as being performed utilizing banks 442-1, 442-2, 442-3, 442-4. FIG. 4 shows the forward propagation of a layer of a Bayesian neural network.

Forward propagation can include generating a plurality of weight values, shown as w_(P(w)) ^(i), utilizing the banks 442-1, 442-2, 442-3 and the stochastic pulse generators 432-1, 432-2. The plurality of weight values can be utilized by the bank 442-4 to generate an output of a layer of the Bayesian neural network.

The bank 442-1 can store the parameters θ={μ^(W) ^(i), ρ^(W) ^(i), μ^(b) ^(i), ρ^(b) ^(i)}_(i) for a given layer of the Bayesian neural network. The parameters θ can be provided to a stochastic pulse generator 432-1. The stochastic pulse generator 432-1 can generate a plurality of samples from q(w^(i)|θ) utilizing the parameters θ. The plurality of weight values sampled from q(w^(i)|θ) can be stored utilizing the memory cells of the bank 442-2 such that the weight values w_(q(θ)) ^(i) are stored in the memory cells of the bank 442-2. The weight values w_(q(θ)) ^(i) can be provided to the stochastic pulse generator 432-2 from the memory cells of the bank 442-2. The stochastic pulse generator 432-2 can sample from P(w) utilizing the weight values w_(q(θ)) ^(i). The sampled weight values w_(P(w)) ^(i) provided by the stochastic pulse generator 432-2 can be stored in the memory cells of the memory bank 442-3. The banks 442-1, 442-2, 442-3 can be updated every training epoch, where a training epoch is defined as one forward and backward cycle.
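The bank-to-bank flow can be summarized, purely as a hypothetical digital model, by the sketch below; the three callables stand in for the pulse-generator and bank stages and are placeholders rather than elements of the disclosure.

```python
def forward_layer(theta, x, sample_posterior, sample_prior, matmul):
    """Model of the four-bank forward flow of FIGS. 4A and 4B."""
    w_q = sample_posterior(theta)  # bank 442-1 -> generator 432-1 -> bank 442-2
    w_p = sample_prior(w_q)        # bank 442-2 -> generator 432-2 -> bank 442-3
    return matmul(w_p, x)          # sampled weights programmed into bank 442-4
```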

The weight values w_(q(θ)) ^(i) and the weight values w_(P(w)) ^(i) can be stored in the registers 441 of a memory device hosting the banks 442-1, 442-2, 442-3, 442-4. The bank 442-4 can receive an input X provided by a pulse generator 431. In various examples, the input X can be provided by a host along with instructions to generate an inference utilizing the Bayesian neural network.

The weight values w^(i) retrieved from the registers 441 can be programmed to the bank 442-4. The input X and the weight values w^(i) can be used to generate an output for a layer of the Bayesian neural network. An output for a hidden layer of the Bayesian neural network can be denoted as h while an output of the Bayesian neural network can be denoted as y. The pulse generator 431 can be a deterministic pulse generator or a stochastic pulse generator.

The control circuitry 405 can utilize the banks 442-1, 442-2, 442-3, 442-4 to implement the forward propagation of a layer of the Bayesian neural network. The control circuitry 405 can control the banks 442-1, 442-2, 442-3, 442-4 by programming the memory cells of the banks 442-1, 442-2, 442-3, 442-4 and by controlling the pulse generators 432-1, 432-2, 431 to provide inputs to the banks 442-1, 442-2, 442-3, 442-4.

FIG. 5 illustrates an example flow for performing backward propagation in accordance with a number of embodiments of the present disclosure. The control circuitry 505 can be configured to perform the backward propagation utilizing banks 542-1, 542-2 and registers 541-1, 541-2.

The backward propagation of a Bayesian neural network comprises calculating a plurality of partial derivatives. For example, the partial derivative of the loss function with respect to the parameters θ can be provided to the bank 542-2 of a memory array by a deterministic pulse generator. The partial derivative of the loss function can be provided as

$\frac{\partial{F\left( {D,\theta} \right)}}{\partial\theta}.$

The partial derivative of the loss function with respect to the parameters θ can be provided as a vector of values such that each of the vector values is provided via one of the access lines of the bank 542-2. The input

$\frac{\partial{F\left( {D,\theta} \right)}}{\partial\theta}$

to the bank 542-2 causes the generation of an output that is equal to a partial derivative of F(w,θ) with respect to w as

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial w}.$

The partial derivative

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial w}$

can be stored in the registers 541-2.

The controller can provide

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial w}$

(e.g., from the registers 541-2) to the bank 542-1. The bank 542-1 can be programmed to store the parameters θ={μ^(W) ^(i), ρ^(W) ^(i), μ^(b) ^(i), ρ^(b) ^(i)}_(i). This input to the bank 542-1 causes the generation, using the parameters θ stored in the bank 542-1, of outputs equal to a partial derivative of F(w,θ) with respect to the mean as

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial\mu}$

and the partial derivative of F(w,θ) with respect to the standard deviation as

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial\rho}.$

In various instances,

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial w}$

can be provided to the bank 542-1 to generate

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial\mu}$

in a first number of operations, and

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial w}$

can be provided a second time to the bank 542-1 to generate

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial\rho}.$

In other examples,

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial\mu}$

and

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial\rho}$

can be generated concurrently using a same number of operations.

The partial derivative

$\frac{\partial{F\left( {D,\theta} \right)}}{\partial\theta}$

can be generated by the banks 542-1, 542-2 or a different bank than those shown here. The partial derivative

$\frac{\partial{F\left( {D,\theta} \right)}}{\partial\theta}$

can also be generated by the control circuitry 505 or a host, in some examples. In various examples, the control circuitry 505 can cause a deterministic pulse generator to provide the partial derivative

$\frac{\partial{F\left( {D,\theta} \right)}}{\partial\theta}$

to the bank 542-2.

Values for

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial\mu}$

and

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial\rho}$

can be stored in the registers 541-1. Although the registers 541-1 and 541-2 are shown as different registers, the registers 541-1 and 541-2 can be a same plurality of registers or a different plurality of registers.
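For reference, under the reparameterization w = μ + log(1 + exp(ρ))·ε assumed earlier, the two stored gradients decompose as sketched below; this is a standard variational-learning identity, offered as an illustration rather than as the disclosed computation.

```python
import math

def grad_mu(dF_dw, dF_dmu):
    """Mean gradient: dw/dmu = 1, so the chain-rule term is just dF/dw,
    added to the direct term dF/dmu."""
    return dF_dw + dF_dmu

def grad_rho(dF_dw, dF_drho, eps, rho):
    """Standard-deviation gradient: dw/drho = eps / (1 + exp(-rho)) under
    the softplus parameterization, matching the update flow of FIG. 6."""
    return dF_dw * eps / (1.0 + math.exp(-rho)) + dF_drho
```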

In various examples, the quantity of banks used in backward propagation of a layer of a Bayesian neural network can be less than the quantity of banks used in forward propagation of the layer of the Bayesian neural network. For example, the quantity of banks used to generate the partial derivatives stored in the registers 541-1 for a given layer of the Bayesian neural network can be less than the quantity of banks used to generate an inference for the layer of the Bayesian neural network.

FIG. 6 illustrates an example flow for updating weights in accordance with a number of embodiments of the present disclosure. The banks used to update the weights of a layer of a Bayesian neural network can be fewer than the banks used to perform backward propagation, which can be fewer than the banks used to perform forward propagation.

The registers 641-2 can provide the partial derivative

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial w}$

and the registers 641-1 can provide the partial derivative

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial\mu}$

as inputs to the bank 642-1 to update the mean of the weights w of a particular layer of the Bayesian neural network such that

$\mu \leftarrow \mu - \alpha\left( \frac{\partial{F\left( {w,\theta} \right)}}{\partial w} + \frac{\partial{F\left( {w,\theta} \right)}}{\partial\mu} \right),$

where α is a constant. The registers 641-2 can provide the partial derivative

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial w}$

and the registers 641-1 can provide the partial derivative

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial\rho}$

as inputs to the bank 642-1 to update the standard deviation such that

$\rho \leftarrow \rho - \alpha\left( \frac{\partial{F\left( {w,\theta} \right)}}{\partial w}\frac{\epsilon}{1 + \exp\left( -\rho \right)} + \frac{\partial{F\left( {w,\theta} \right)}}{\partial\rho} \right).$

ϵ is a stochastic pulse. ρ is stored in the bank 642-1 along with other parameters. A portion of

$\rho - \alpha\left( \frac{\partial{F\left( {w,\theta} \right)}}{\partial w}\frac{\epsilon}{1 + \exp\left( -\rho \right)} + \frac{\partial{F\left( {w,\theta} \right)}}{\partial\rho} \right)$

can be calculated externally from the banks 642-1 and 642-2, such as by an arithmetic unit. The arithmetic unit can be internal to the controller 605 or external to the controller 605. The bank 642-1 can receive the inputs through the sense lines and the access lines to update the values stored in the memory cells. The addition of the partial derivatives

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial w} + \frac{\partial{F\left( {w,\theta} \right)}}{\partial\mu}$

represents the gradient with respect to the mean. The gradient with respect to the standard deviation is expressed as

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial w}\frac{\epsilon}{1 + \exp\left( -\rho \right)} + \frac{\partial{F\left( {w,\theta} \right)}}{\partial\rho},$

where ϵ is the stochastic pulse. In various instances, μ and ρ for the weights can be updated concurrently or separately.
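Putting the two update rules together, a minimal Python sketch of one descent step on a single (μ, ρ) pair, with α as the learning rate and ε the stochastic pulse value used when the weight was sampled:

```python
import math

def update_parameters(mu, rho, dF_dw, dF_dmu, dF_drho, eps, alpha):
    """One gradient-descent step on the variational parameters."""
    # mu  <- mu  - alpha * (dF/dw + dF/dmu)
    new_mu = mu - alpha * (dF_dw + dF_dmu)
    # rho <- rho - alpha * (dF/dw * eps / (1 + exp(-rho)) + dF/drho)
    new_rho = rho - alpha * (dF_dw * eps / (1.0 + math.exp(-rho)) + dF_drho)
    return new_mu, new_rho
```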

The updated parameters θ can be used to perform forward propagation of the corresponding layer utilizing the bank 642-1. The bank 642-2 is shown in FIG. 6 to show that a single bank 642-1 is used and to show that the bank that stores the parameters θ for a layer of the Bayesian neural network can also be used to update the parameters θ. Although the operations performed in the banks of the memory array are described as being performed utilizing four banks, the operations corresponding to a layer of the Bayesian neural network can be performed with a fewer or a greater number of banks. For example, forward propagation, backward propagation, and weight updating for a particular layer of the Bayesian neural network can be performed utilizing one bank, two banks, three banks, four banks, and/or five or more banks.

FIG. 7 illustrates an example flow diagram of a method 780 for implementing a Bayesian neural network in memory in accordance with a number of embodiments of the present disclosure. The method 780 can be performed by processing logic that can include hardware (e.g., processing device, circuitry, dedicated logic, programmable logic, microcode, hardware of a device, integrated circuit, etc.), software (e.g., instructions run or executed on a processing device), or a combination thereof. In some embodiments, the method 780 is performed by the control circuitry (e.g., controller) 105 of FIG. 1. Although shown in a particular sequence or order, unless otherwise specified, the order of the processes can be modified. Thus, the illustrated embodiments should be understood only as examples, and the illustrated processes can be performed in a different order, and some processes can be performed in parallel. Additionally, one or more processes can be omitted in various embodiments. Thus, not all processes are required in every embodiment. Other process flows are possible.

At 781, a first data comprising a first partial derivative of a first loss function of a training set and a plurality of parameters can be provided to a first bank of a memory device, wherein the first loss function corresponds to a layer of a Bayesian neural network. The first partial derivative is provided as

$\frac{\partial{F\left( {D,\theta} \right)}}{\partial\theta}.$

At 782, the first bank is utilized to generate a second data, comprising a second partial derivative of a second loss function of a plurality of weights and the plurality of parameters, responsive to receipt of the data comprising the first partial derivative. The second partial derivative is provided as

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial w}.$

At 783, the second bank is utilized to generate a third data comprising a third partial derivative of the second loss function, wherein the third data is generated using the second data.

At 784, the second bank can be utilized to generate a fourth data comprising a fourth partial derivative of the second loss function, wherein the fourth data is generated using the second data. The third partial derivative is provided as

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial\mu}$

and the fourth partial derivative is provided as

$\frac{\partial{F\left( {w,\theta} \right)}}{\partial\rho}.$

At 785, updated weight values of the layer of the Bayesian neural network can be written, to another portion of the memory device, based at least in part on the third data comprising the third partial derivative and the fourth data comprising the fourth partial derivative.

The first loss function can be generated at one of a controller of the memory device and a third bank of the memory device. The first loss function can provide a measure of the divergence between a variational posterior (e.g., q(w^(i)|θ)), a prior (e.g., P(w^(i))), and a measure of fit (e.g., log P(D|w^(i))) between the plurality of weights and the training data of the Bayesian neural network. The variational posterior and the prior can be generated using different banks of the memory device. For example, the variational posterior and the prior can be generated in banks other than the first bank and the second bank.

The second partial derivative can be stored in a first plurality of registers of the memory device to make the second partial derivative available to the second bank and available for updating the plurality of deterministic values (e.g., θ) of the Bayesian neural network. The third partial derivative and the fourth partial derivative can be stored in a second plurality of registers to make the third partial derivative and the fourth partial derivative available for updating the plurality of deterministic values of the Bayesian neural network.

In different examples, a plurality of deterministic values can be accessed from a first plurality of memory cells of the memory array. A first plurality of weight values and a first plurality of bias values can be generated based on the plurality of deterministic values, where the first plurality of weight values and the first plurality of bias values are generated using a second plurality of memory cells of the memory array. A second plurality of weight values and a second plurality of bias values can be generated based on the first plurality of weight values and the first plurality of bias values, where the second plurality of weight values and the second plurality of bias values are generated using a third plurality of memory cells of the memory array. A result and a confidence in the result can be determined utilizing an input provided by a host, the second plurality of weight values, and the second plurality of bias values of a Bayesian neural network, where the result and the confidence are an output of a fourth plurality of memory cells of the memory array.

The deterministic values can comprise a weight mean, a weight standard deviation, a bias mean, and a bias standard deviation of a plurality of layers, implemented in the memory array, of the Bayesian neural network. A controller of a memory device can access the plurality of deterministic values, generate the first plurality of weight values and the first plurality of bias values, generate the second plurality of weight values and the second plurality of bias values, and determine the result and the confidence in the result utilizing a plurality of banks. The plurality of banks can comprise the first plurality of memory cells, the second plurality of memory cells, the third plurality of memory cells, and the fourth plurality of memory cells. For example, the first plurality of memory cells can comprise a first bank, the second plurality of memory cells can comprise a second bank, the third plurality of memory cells can comprise a third bank, and the fourth plurality of memory cells can comprise a fourth bank. The first bank, the second bank, the third bank, and the fourth bank can comprise a layer of the Bayesian neural network.

In various examples, the first plurality of memory cells, the second plurality of memory cells, and the third plurality of memory cells can comprise a first bank and the fourth plurality of memory cells can comprise a second bank. The first bank and the second bank can comprise a layer of the Bayesian neural network.

In various instances, the first plurality of weight values, the first plurality of bias values, the second plurality of weight values, and the second plurality of bias values of the Bayesian neural network can be sampled utilizing a number of stochastic pulse generators coupled to the memory array. The first plurality of weight values and the first plurality of bias values of the Bayesian neural network can be sampled utilizing a first stochastic pulse generator of the number of stochastic pulse generators. A second plurality of weight values of the Bayesian neural network can also be sampled utilizing a second stochastic pulse generator of the number of stochastic pulse generators.

The number of stochastic pulse generators can be controlled to provide a first plurality of voltage pulses to a first plurality of signal lines to which the second plurality of memory cells are coupled to sample the first plurality of weight values and the first plurality of bias values. The stochastic pulse generators can also be configured to provide a second plurality of voltage pulses to a second plurality of signal lines to which the third plurality of memory cells are coupled to sample the second plurality of weight values and the second plurality of bias values.

The first plurality of voltage pulses can cause a first plurality of currents to be emitted from the second plurality of memory cells based on a resistance of the second plurality of memory cells. The second plurality of voltage pulses can cause a second plurality of currents to be emitted from the third plurality of memory cells based on a resistance of the third plurality of memory cells.

The second plurality of memory cells can be coupled to a first plurality of different signal lines, where the third plurality of memory cells are coupled to a second plurality of different signal lines. The first plurality of currents can be provided to an analog-to-digital converter via the first plurality of different signal lines. The first plurality of currents can represent the first plurality of weight values and the first plurality of bias values. The second plurality of currents can be provided to a different analog-to-digital converter via the second plurality of different signal lines, wherein the second plurality of currents represent the second plurality of weight values and the second plurality of bias values of the Bayesian neural network.

The analog-to-digital converter can generate a first plurality of output voltages corresponding to the first plurality of currents. A second plurality of output voltages corresponding to the second plurality of currents can be generated utilizing the different analog-to-digital converter. The second plurality of output voltages represent the second plurality of weight values and the second plurality of bias values that are sampled.

The plurality of registers can store a first plurality of values corresponding to the first plurality of output voltages and a second plurality of values corresponding to the second plurality of output voltages to make the first plurality of values and the second plurality of values available to the different stochastic pulse generator and the fourth plurality of memory cells.

Various examples can implement a system comprising a controller coupled to a memory device and a plurality of registers. The controller can be configured to access a first partial derivative of a loss function with respect to a mean of a plurality of deterministic values of a Bayesian neural network from the plurality of registers, access a second partial derivative of the loss function with respect to a standard deviation of the plurality of deterministic values from the plurality of registers, access a third partial derivative of the loss function with respect to a plurality of weight values of the Bayesian neural network from the plurality of registers, and update a resistance of the plurality of memory cells to store an updated plurality of deterministic values. The resistance of the plurality of memory cells can be updated using the first partial derivative, the second partial derivative, and the third partial derivative.

A first plurality of voltages corresponding to the first partial derivative and the second partial derivative can be applied to a first plurality of signal lines. A second plurality of voltages corresponding to the third partial derivative can be applied to a second plurality of signal lines, where each of the plurality of memory cells is coupled to respective ones of the first plurality of signal lines and the second plurality of signal lines. Applying the first plurality of voltages and the second plurality of voltages can update the resistance of the plurality of memory cells. The resistance of the plurality of memory cells can represent a plurality of deterministic values of the Bayesian neural network.

FIG. 8 illustrates an example machine of a computer system 890 within which a set of instructions, for causing the machine to perform various methodologies discussed herein, can be executed. In various embodiments, the computer system 890 can correspond to a system (e.g., the computing system 100 of FIG. 1) that includes, is coupled to, or utilizes a memory sub-system (e.g., the memory device 103 of FIG. 1) or can be used to perform the operations of a controller (e.g., the control circuitry 105 of FIG. 1). In alternative embodiments, the machine can be connected (e.g., networked) to other machines in a LAN, an intranet, an extranet, and/or the Internet. The machine can operate in the capacity of a server or a client machine in a client-server network environment, as a peer machine in a peer-to-peer (or distributed) network environment, or as a server or a client machine in a cloud computing infrastructure or environment.

The machine can be a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a server, a network router, a switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

The example computer system 890 includes a processing device 891, a main memory 893 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a static memory 897 (e.g., flash memory, static random access memory (SRAM), etc.), and a data storage system 899, which communicate with each other via a bus 897.

Processing device 891 represents one or more general-purpose processing devices such as a microprocessor, a central processing unit, or the like. More particularly, the processing device can be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processing device 891 can also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. The processing device 891 is configured to execute instructions 892 for performing the operations and steps discussed herein. The computer system 890 can further include a network interface device 895 to communicate over the network 896.

The data storage system 899 can include a machine-readable storage medium 889 (also known as a computer-readable medium) on which is stored one or more sets of instructions 892 or software embodying any one or more of the methodologies or functions described herein. The instructions 892 can also reside, completely or at least partially, within the main memory 893 and/or within the processing device 891 during execution thereof by the computer system 890, the main memory 893 and the processing device 891 also constituting machine-readable storage media.

In one embodiment, the instructions 892 include instructions to implement functionality corresponding to the host 102 and/or the memory device 103 of FIG. 1. While the machine-readable storage medium 889 is shown in an example embodiment to be a single medium, the term “machine-readable storage medium” should be taken to include a single medium or multiple media that store the one or more sets of instructions. The term “machine-readable storage medium” shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure. The term “machine-readable storage medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.

As used herein, “a number of” something can refer to one or more of such things. For example, a number of memory devices can refer to one or more memory devices. A “plurality” of something intends two or more. Additionally, designators such as “N,” as used herein, particularly with respect to reference numerals in the drawings, indicate that a number of the particular feature so designated can be included with a number of embodiments of the present disclosure.

The figures herein follow a numbering convention in which the first digit or digits correspond to the drawing figure number and the remaining digits identify an element or component in the drawing. Similar elements or components between different figures may be identified by the use of similar digits. As will be appreciated, elements shown in the various embodiments herein can be added, exchanged, and/or eliminated so as to provide a number of additional embodiments of the present disclosure. In addition, the proportion and the relative scale of the elements provided in the figures are intended to illustrate various embodiments of the present disclosure and are not to be used in a limiting sense.

Although specific embodiments have been illustrated and described herein, those of ordinary skill in the art will appreciate that an arrangement calculated to achieve the same results can be substituted for the specific embodiments shown. This disclosure is intended to cover adaptations or variations of various embodiments of the present disclosure. It is to be understood that the above description has been made in an illustrative fashion, and not a restrictive one. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description. The scope of the various embodiments of the present disclosure includes other applications in which the above structures and methods are used. Therefore, the scope of various embodiments of the present disclosure should be determined with reference to the appended claims, along with the full range of equivalents to which such claims are entitled.

In the foregoing Detailed Description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the disclosed embodiments of the present disclosure have to use more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.

What is claimed is:
 1. An apparatus, comprising: a memory array; acontroller coupled to the memory array and configured to: read data froma first plurality of memory cells of the memory array; generate, using asecond plurality of memory cells of the memory array, a first pluralityof weight values and a first plurality of bias values based on aplurality of deterministic values read from the first plurality ofmemory cells; generate, using a third plurality of memory cells of thememory array, a second plurality of weight values and a second pluralityof bias values based on the first plurality of weight values and thesecond plurality of bias values; and transmit output data from a fourthplurality of memory cells of the memory array, the output datacomprising a result and a confidence in the result that is based atleast on: an input provided by a host, the second plurality of weightvalues, and the second plurality of bias values of a Bayesian neuralnetwork.
2. The apparatus of claim 1, wherein the deterministic values comprise a weight mean, a weight standard deviation, a bias mean, and a bias standard deviation of a plurality of layers, implemented in the memory array, of the Bayesian neural network.
3. The apparatus of claim 2, wherein the controller is further configured to: access the plurality of deterministic values, generate the first plurality of weight values and the first plurality of bias values, generate the second plurality of weight values and the second plurality of bias values, and determine the result and the confidence in the result utilizing a plurality of banks; wherein the plurality of banks comprise the first plurality of memory cells, the second plurality of memory cells, the third plurality of memory cells, and the fourth plurality of memory cells.
4. The apparatus of claim 3, wherein: the first plurality of memory cells comprises a first bank, the second plurality of memory cells comprises a second bank, the third plurality of memory cells comprises a third bank, and the fourth plurality of memory cells comprises a fourth bank; and the first bank, the second bank, the third bank, and the fourth bank comprise a layer of the Bayesian neural network.
5. The apparatus of claim 3, wherein: the first plurality of memory cells, the second plurality of memory cells, and the third plurality of memory cells comprise a first bank and the fourth plurality of memory cells comprises a second bank; and the first bank and the second bank comprise a layer of the Bayesian neural network.
6. The apparatus of claim 1, wherein the controller is further configured to sample the first plurality of weight values, the first plurality of bias values, the second plurality of weight values, and the second plurality of bias values of the Bayesian neural network utilizing a number of stochastic pulse generators coupled to the memory array.
7. The apparatus of claim 6, wherein the controller is further configured to sample the first plurality of weight values and the first plurality of bias values of the Bayesian neural network utilizing a first stochastic pulse generator of the number of stochastic pulse generators and the second plurality of weight values of the Bayesian neural network utilizing a second stochastic pulse generator of the number of stochastic pulse generators.
8. The apparatus of claim 6, wherein the controller is configured to control the number of stochastic pulse generators to: provide a first plurality of voltage pulses to a first plurality of signal lines to which the second plurality of memory cells are coupled to sample the first plurality of weight values and the first plurality of bias values; and provide a second plurality of voltage pulses to a second plurality of signal lines to which the third plurality of memory cells are coupled to sample the second plurality of weight values and the second plurality of bias values.
9. The apparatus of claim 8, wherein: the first plurality of voltage pulses cause a first plurality of currents to be emitted from the second plurality of memory cells based on a resistance of the second plurality of memory cells; and the second plurality of voltage pulses cause a second plurality of currents to be emitted from the third plurality of memory cells based on a resistance of the third plurality of memory cells.
10. The apparatus of claim 9, wherein the second plurality of memory cells are coupled to a first plurality of different signal lines; wherein the third plurality of memory cells are coupled to a second plurality of different signal lines; and wherein the controller is further configured to control the memory array to: provide the first plurality of currents via the first plurality of different signal lines to an analog-to-digital converter, wherein the first plurality of currents represent the first plurality of weight values and the first plurality of bias values; and provide the second plurality of currents via the second plurality of different signal lines to a different analog-to-digital converter, wherein the second plurality of currents represent the second plurality of weight values and the second plurality of bias values of the Bayesian neural network.
11. The apparatus of claim 10, wherein the controller is further configured to: generate, utilizing the analog-to-digital converter, a first plurality of output voltages corresponding to the first plurality of currents; and generate, utilizing the different analog-to-digital converter, a second plurality of output voltages corresponding to the second plurality of currents, wherein the second plurality of output voltages represent the second plurality of weight values and the second plurality of bias values that are sampled.
12. The apparatus of claim 11, further comprising a plurality of registers, and wherein the controller is further configured to: store a first plurality of values corresponding to the first plurality of output voltages and a second plurality of values corresponding to the second plurality of output voltages in the plurality of registers to make the first plurality of values and the second plurality of values available to a different stochastic pulse generator and the fourth plurality of memory cells.
13. A method, comprising: providing a first data comprising a first partial derivative of a first loss function of a training set and a plurality of parameters to a first bank of a memory device, wherein the first loss function corresponds to a layer of a Bayesian neural network; utilizing the first bank to generate a second data, comprising a second partial derivative of a second loss function of a plurality of weights and the plurality of parameters, responsive to receipt of the first data comprising the first partial derivative; utilizing a second bank to generate a third data comprising a third partial derivative of the second loss function, wherein the third data is generated using the second data; utilizing the second bank to generate a fourth data comprising a fourth partial derivative of the second loss function, wherein the fourth data is generated using the second data; and writing, to another portion of the memory device, updated weight values of the layer of the Bayesian neural network based at least in part on the third data comprising the third partial derivative and the fourth data comprising the fourth partial derivative.
14. The method of claim 13, further comprising: generating the first loss function at one of a controller of the memory device and a third bank of the memory device; wherein the first loss function provides a measure of divergence between a variational posterior and a prior, and a measure of fit between the plurality of weights and the training set of the Bayesian neural network.
15. The method of claim 14, further comprising generating the variational posterior and the prior in different banks of the memory device.
16. The method of claim 13, further comprising storing the second data comprising the second partial derivative in a first plurality of registers of the memory device to make the second partial derivative available to the second bank and available for updating a plurality of deterministic values of the Bayesian neural network.
17. The method of claim 16, further comprising storing the third data comprising the third partial derivative and the fourth data comprising the fourth partial derivative in a second plurality of registers to make the third partial derivative and the fourth partial derivative available for updating the plurality of deterministic values of the Bayesian neural network.
18. A system, comprising: a memory device; a plurality of registers; and a controller coupled to the memory device and the plurality of registers and configured to: access a first data representing a first partial derivative of a loss function of a Bayesian neural network from the plurality of registers; access a second data representing a second partial derivative of the loss function from the plurality of registers; access a third data representing a third partial derivative of the loss function from the plurality of registers; and update a resistance of a plurality of memory cells of the memory device, to store an updated plurality of deterministic values, utilizing the first data, the second data, and the third data.
19. The system of claim 18, wherein the controller configured to update the resistance of the plurality of memory cells is further configured to: apply a first plurality of voltages corresponding to the first partial derivative and the second partial derivative to a first plurality of signal lines; and apply a second plurality of voltages corresponding to the third partial derivative to a second plurality of signal lines, wherein each of the plurality of memory cells is coupled to respective ones of the first plurality of signal lines and the second plurality of signal lines, and wherein applying the first plurality of voltages and the second plurality of voltages updates the resistance of the plurality of memory cells.
20. The system of claim 18, wherein the resistance of the plurality of memory cells represents a plurality of deterministic values of the Bayesian neural network.
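
ILLUSTRATIVE SKETCHES

The following non-limiting software sketches illustrate the behavior recited in the claims above. The function names, array shapes, learning rate, and converter resolution are assumptions introduced for illustration only and do not appear in the claims.

Claims 1 and 2 recite generating weight and bias values from stored deterministic values (a mean and a standard deviation per weight and per bias) and transmitting a result together with a confidence in the result. A minimal sketch, assuming Gaussian sampling and a small fully connected network:

import numpy as np

rng = np.random.default_rng(0)

def sample_layer(w_mean, w_std, b_mean, b_std):
    # One stochastic draw of a layer's weights and biases from the
    # stored deterministic values (claim 2): value = mean + std * noise.
    w = w_mean + w_std * rng.standard_normal(w_mean.shape)
    b = b_mean + b_std * rng.standard_normal(b_mean.shape)
    return w, b

def forward(x, layers, n_samples=32):
    # Repeated stochastic forward passes; the mean of the outputs is
    # the result and their spread stands in for the confidence in the
    # result (claim 1).
    outs = []
    for _ in range(n_samples):
        h = x
        for w_mean, w_std, b_mean, b_std in layers:
            w, b = sample_layer(w_mean, w_std, b_mean, b_std)
            h = np.tanh(h @ w + b)
        outs.append(h)
    outs = np.stack(outs)
    return outs.mean(axis=0), outs.std(axis=0)

A small spread across samples corresponds to high confidence, which mirrors transmitting "a result and a confidence in the result" rather than a point estimate.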
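Claims 8 through 11 describe the physical sampling path: voltage pulses applied to signal lines produce cell currents set by cell resistance, and the summed line currents are digitized. A sketch of that read path under an idealized Ohm's-law model, with conductance G = 1/R; the 8-bit converter is an assumption, not a recitation:

import numpy as np

def analog_read(pulse_voltages, conductances):
    # Each pulse voltage drives a current I = V * G through each cell
    # (claims 8 and 9); currents sharing a signal line sum, which is
    # the analog multiply-accumulate.
    return pulse_voltages @ conductances

def adc(currents, full_scale, bits=8):
    # Idealized analog-to-digital conversion of the summed line
    # currents into discrete output levels (claims 10 and 11).
    step = full_scale / (2 ** bits - 1)
    return np.round(np.clip(currents, 0.0, full_scale) / step) * step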
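The first loss function of claim 14, a measure of divergence between the variational posterior and the prior plus a measure of fit to the training set, matches the standard variational free energy. In conventional notation (the symbols are ours), with variational posterior q(w | θ), prior P(w), and training set D:

    \mathcal{F}(\mathcal{D}, \theta)
      = \mathrm{KL}\left[\, q(\mathbf{w} \mid \theta) \,\big\|\, P(\mathbf{w}) \,\right]
      - \mathbb{E}_{q(\mathbf{w} \mid \theta)}\left[\, \log P(\mathcal{D} \mid \mathbf{w}) \,\right]

The first term penalizes divergence of the posterior from the prior; the second rewards fit between the sampled weights and the training set; training minimizes their sum.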
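Claim 13 propagates a chain of partial derivatives of the loss back through a layer. Under the usual reparameterization w = mean + std * eps, the derivative with respect to a sampled weight (the second data) fans out into derivatives with respect to the stored mean and standard deviation (the third and fourth data). A sketch of just that fan-out; the direct dependence of the divergence term on the mean and standard deviation is omitted:

def weight_gradients(dL_dw, eps):
    # dL_dw: partial derivative of the loss with respect to the sampled
    # weights (claim 13's second data); eps: the noise used when the
    # weights were sampled as w = mean + std * eps.
    dL_dmean = dL_dw          # dw/dmean = 1 (the third data)
    dL_dstd = dL_dw * eps     # dw/dstd = eps (the fourth data)
    return dL_dmean, dL_dstd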
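Claims 18 and 19 apply voltages derived from the accessed partial derivatives to update cell resistance, i.e., the stored deterministic values. A sketch of that update as a gradient step on per-cell conductances, assuming one cell stores each weight mean and a second cell stores each weight standard deviation:

import numpy as np

def update_cells(g_mean, g_std, dL_dmean, dL_dstd, lr=0.01):
    # Voltage pulses on the two sets of signal lines (claim 19) nudge
    # each cell's conductance against the gradient, storing the updated
    # deterministic values (claims 18 and 20).
    g_mean = g_mean - lr * dL_dmean
    g_std = g_std - lr * dL_dstd
    # Conductance is non-negative in a physical cell.
    return np.clip(g_mean, 0.0, None), np.clip(g_std, 0.0, None)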